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Multiple variants aggregate in the neuregulin signaling 
pathway in a subset of schizophrenia patients 

A Hatzimanolis 1 , JA McGrath 1 , R Wang 1 , T Li 2 , PC Wong 2 , G Nestadt 1 , PS Wolyniec 1 , D Valle 3 , AE Pulver 1 and D Avramopoulos 1,3 

Despite the strongly held view that schizophrenia (SZ) shows substantial genetic heterogeneity, pathway heterogeneity, as seen 
in cancer where different pathways are affected in similar tumors, has not been explored. We explore this possibility in a case- 
only study of the neuregulin signaling pathway (NSP), which has been prominently implicated in SZ and for which there is 
detailed knowledge on the ligand- and receptor-processing steps through p- and y-secretase cleavage. We hypothesize that 
more than one damaging variants in the NSP genes might be necessary to cause disease, leading to an apparent clustering of 
such variants in only the few patients with affected NSP. We analyze linkage and next-generation sequencing results for the 
genes encoding components of the pathway, including NRG1, NRG3, ERBB4, p-secretase and the y-secretase complex. We find 
multiple independent examples of supporting evidence for this hypothesis: (i) increased linkage scores over NSP genes, (ii) 
multiple positive interlocus correlations of linkage scores across families suggesting each family is linked to either many or 
none of the genes, (iii) aggregation of predicted damaging variants in a subset of individuals and (iv) significant phenotypic 
differences of the subset of patients carrying such variants. Collectively, our data strongly support the hypothesis that the NSP is 
affected by multiple damaging variants in a subset of phenotypically distinct patients. On the basis of this, we propose a general 
model of pathway heterogeneity in SZ, which, in part, may explain its phenotypic variability and genetic complexity. 
Translational Psychiatry (2013) 3, e264; doi:1 0.1 038/tp.201 3.33; published online 28 May 2013 



Introduction 

Schizophrenia (SZ) is a common and disabling mental 
disorder with complex inheritance and an estimated herit- 
ability exceeding 80%. 1 Genome-wide association studies are 
beginning to identify common genetic variation accounting for 
a small fraction of the genetic risk and support the existence of 
a very large number of susceptibility loci. 2-5 Many studies 
have demonstrated a strong selective disadvantage for SZ, 
the patients having on average half the number of offspring 
than the general population, 6 " 8 yet the frequency of the 
disease persists high at 1%. Not surprisingly, studies also 
implicate de novo mutations in SZ, 9-15 which suggests 
constant replenishment of pathogenic alleles balancing the 
selection process. The frequent involvement of de novo 
mutations could be due, in part, to the large number of target 
genes involved in neurodevelopment. The development of 
the nervous system is an elaborate process involving the 
coordinated action of genes and their products to accomplish 
individual tasks such as the coordinated growth, migration and 
synapsing of neurons to form a system capable of cognitive 
processing. Although the biological processes accomplishing 
these tasks might be robust and tolerate mildly damaging 
variation, it is possible that accumulation of such variation in 
components of the same process, either by new mutation or 
by co-inheritance, can lead to significant neurodevelopmental 



defects and SZ. Although single deleterious mutations would 
be quickly cleared by selection, mildly deleterious variants that 
aggregate to cause disease would be resistant to selection 
because as their frequency becomes lower, their rate of 
co-occurrence is reduced exponentially and in most carriers 
they do not lead to a selective disadvantage. 

Studies in cancer have shown that mutations in genes 
encoding components of several different cell proliferation 
pathways can produce similar tumor types. 16 ' 17 Like cancer, 
SZ shows high phenotypic heterogeneity despite the com- 
monalities responsible for considering it a single disease 
entity. As many processes are involved in neurodevelopment, 
disruption of any one might be sufficient to cause SZ. Where in 
cancer a second hit often occurs as a somatic mutation, 18 in 
the case of SZ all deleterious variants would need to be 
cotransmitted (which might include germline de novo events). 
For variants with relatively low allele frequencies, for example, 
below 10-15% each, this would be a rare event (for example, 
< 3% for two co-occurring variants, < 0.5% for three variants, 
and so on). In addition to their allowing persistence in the 
population, this could explain the small observed effect sizes, 
the small estimated fraction of explained heritability, 2 the 
extensive phenotypic heterogeneity of SZ and the difficulties 
in gene mapping. Our proposed model is illustrated in Figure 1 
and we refer to it as the pathway clustering model. What is 
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Canonical ^^^^ 
forward signaling ^^^^^^ 

Figure 1 Schematic of neuregulin-ERBB4 signaling showing the roles of the 
products of the selected genes. The y-secretase multiprotein complex is 
constructed from the products of the genes: PSEN1, PSEN2, APH1A, APH1B, 
PSENEN and NCSTN. The cleavage of NRG3 by BACE1 is hypothesized and has 
not been shown experimentally. 



important in testing this model is to know enough about the 
coordinated action of genes to perform a function to define the 
most concise group without missing important components. 

Since being associated with SZ in 2002, 19,20 NRG1 
(encoding neuregulin 1) is among the most widely examined 
genes showing converging evidence of association in multiple 
studies. 21-25 The evidence has been expanding to include 
other genes involved in neuregulin (NRG) signaling. Our 
group has previously reported that variation in NRG3 
(neuregulin 3), a paralog of NRG1, is associated with the 
development of delusional symptomatology in SZ patients, 26 
a finding replicated by other investigators. 27 ' 28 Accumulating 
evidence from association studies also involves ERBB4, the 
gene encoding the main receptor for NRG1 and NRG3, 29 ' 30 
and variation in ERBB4 has been reported to interact with 
variation in NRG1 increasing susceptibility to SZ. 31 ' 32 

NRGs are a family of signaling molecules with extracellular 
epidermal growth factor domains that bind to ERBB receptors 
to modulate neuronal migration and differentiation. 33 Two 
members, NRG1 and NRG3, have been linked to SZ and they 
share the receptor ERBB4. Studies on NRG1 have shown that 
once produced it is inserted in the cell membrane, where it can 
either signal solely to adjacent cells or be released through 
cleavage by (3-secretase 34 (Figure 1). Following binding of the 
NRG1 epidermal growth factor domain, ERBB4 is either 
autophosphorylated or transphosphorylated (canonical for- 
ward signaling) or cleaved by y-secretase to release the 
C-terminal intracellular domain, which can translocate to the 
nucleus and regulate transcription (non-canonical forward 
signaling). It has also been shown that NRG1 itself can be 
cleaved by y-secretase with the C-terminal intracellular 
domain migrating to the nucleus in canonical backward 
signaling. 35-37 The components and relationships of the key 
players in the NRG signaling pathway (NSP) are illustrated in 
Figure 1 . The 10 genes encoding these NSP components are 
the focus of our study. Evidence from mouse models supports 
our choice of genes and a role of (3- and y-secretase in SZ 



through NRG signaling. Mice lacking BACE1, encoding for 
(3-secretase, exhibit SZ-like phenotypes. 38 Similarly, deficiency 
of APH1B, a component of y-secretase, also causes 
neurobehavioral abnormalities similar to SZ. 39 In addition, 
altered levels of specific NRG1 mRNA transcripts and protein 
fragments have been reported in the prefrontal cortex and 
hippocampus of SZ patients, providing more evidence for 
dysregulated NRG1 processing. 40-42 We note that NRG3 has 
not yet been shown to be processed through the same 
mechanisms as NRG1 ; however, we include it in our analysis 
because of its close relationship to NRG1 and its involvement 
in SZ. 

Here, we explore our hypothesis of a pathway clustering 
model on the NSP, taking a narrow and stringent definition of 
pathway that includes only components known to be 
connected in a well-defined biological process. Predictions 
stemming from this model include that: (i) evidence of linkage 
(allele sharing) over the genes of the pathway will be weak but 
consistently elevated; (ii) families linked to one locus will also 
be linked to the other loci encoding components of the 
pathway; (iii) patients with damaging DNA variants in one 
gene will more often also have damaging variants in other 
genes of the same pathway; and (iv) the subgroup of patients 
with the NSP pathway affected will differ in phenotype from 
other patients. 

Patients and methods 

SZ pedigrees. We examined 123 multiplex families (that is, 
more than one affected family member) of European 
Caucasian background for which genome-wide linkage data 
are available and reported previously 43 Details on the 
recruitment procedures are described in Blouin et al. 44 All 
individuals were interviewed using either a modified 
Diagnostic Interview Schedule 45 or the Diagnostic Interview 
for Genetic Studies (DIGS). 46 Diagnoses were made through 
a consensus procedure using the Diagnostic and Statistical 
Manual of Mental Disorders, 4th edition (DSM-IV) consensus 
checklist criteria. We also evaluated nine factorial pheno- 
types derived from a principal component phenotypic 
analysis of 73 indicators, which has been previously 
described in detail 47 These nine factors include the often- 
reported delusions, hallucinations, disorganization, negative, 
and affective factors and novel factors such as child/ 
adolescent sociability, scholastic performance, disability/ 
impairment and prodromal. All nine factors demonstrate 
significant familiality, with the highest scores for disability/ 
impairment (0.61), disorganization (0.60) and scholastic 
performance (0.51) 47 

The Johns Hopkins institutional review board has approved 
all recruitment methods and protocols for collection of clinical 
data and blood samples for this study and informed consent 
has been obtained from all individuals. 

Linkage data analysis. Genotyping, quality control proce- 
dures and linkage analysis results have been described 
previously 43 We genotyped the 6008 single-nucleotide 
polymorphisms (SNPs) in the lllumina BeadArray Linkage 
mapping panel version 4.0 (lllumina, San Diego, CA, USA) at 
the Center for Inherited Disease Research at Johns Hopkins. 
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We calculated the non-parametric linkage (NPL) scores 
across the genome for each family and for the whole set of 
pedigrees using the analysis software MERLIN. 48 Sample- 
wide NPL scores across the genome were used to extract the 
scores at the location of each of the 10 genes of interest and 
to obtain their average, which was then compared with the 
average scores of 100000 random draws of 10 loci from 
across the genome. This method, which we have described 
and validated in simulated and real data, tests whether a set 
of loci is enriched for true linkage signals, regardless of their 
individual statistical significance. 49 For each gene, we 
identified the SNP within or closest to the gene in the linkage 
panel, which has a density of about one SNP every 500 kb, 
and the NPL score reported at that location was used for the 
analysis. 

We used the individual family NPL scores at each of the 10 
loci to examine pairwise correlations between linkage scores. 
These involved 123 cases (families) and 10 variables (loci). 
We hypothesized that if multiple hits in a pathway are 
involved, the families that show linkage at one locus are also 
more likely to show linkage at another, while the remaining 
families will show linkage at none. We considered NPL scores 
between -0.5 and +0.5 as uninformative and coded them 
as 0, otherwise positive scores were coded as 1 (evidence for 
linkage) and negative scores as - 1 (evidence against 
linkage), regardless of their actual magnitude to remove 
biases due to family size. Owing to the lack of normal 
distribution, Kendall's tau (t) coefficient was used to calculate 
linkage correlations between loci. The same was used to 
calculate correlations between the number of variants at one 
gene and the number of variants at the remaining pathway 
genes. These P-values are one-tailed since only positive 
correlations are consistent with the model and were validated 
by 1 000 000 permutations of the variant carriers for each 
gene (see Supplementary Material). To assess correlations 
between quantitative factor phenotypes and the load of 
variants in the NSP, we used a generalized linear model that 
included family ID as a categorical factor. This was preferred 
because we have shown that these factors show significant 
familial aggregation 47 These tests were two-tailed as there is no 
a priori expected direction. All statistical analysis was performed 
using the software Statistica (StatSoft, Tulsa, OK, USA). 



Sample pooling and exon sequencing. We sequenced 
pairs of distant affected relatives, so that in addition to 
assessing clustering of variants in individuals, we could 
assess whether the same pathway is perturbed across the 
extended family. On the basis of the availability of affected 
distant relatives and regardless of linkage to the loci, we 
selected 24 pairs of relatives (48 SZ cases), the closest 
relationship being avuncular (25% expected genome shar- 
ing). The expected identity by descent due to relatedness 
was 0.25 for 12 pairs, 0.125 for 9 pairs, 0.0625 for 2 pairs and 
0.03125 for 1 pair. We used next-generation technology to 
sequence all 120 protein coding exons (including alternative 
exons) of NRG1 and NRG3, the NRG receptor gene ERBB4, 
the BACE1 gene, which encodes (3-secretase enzyme, and 
the six genes that encode the components of the y-secretase 
multiprotein complex (PSEN1, PSEN2, PSENEN, NCSTN, 
APH1A and APH1B), totaling 38.9 kb of genomic sequence. 
We sequenced amplified DNA from pools of four subjects (12 
pools in total) following the procedures reported by Niranjan 
et al. 50 with minor modifications. We aligned the reads to the 
reference human genome sequence (hg19 build) using the 
Bowtie alignment program, 51 and subsequently processed, 
sorted and indexed using SAMtools. 52 Variant identification 
was performed for each pool using the Integrated Genomics 
Viewer software (IGV 2.1 , http://www.broadinstitute.org/igv/). 
We validated all identified coding variants and de-convoluted 
the pools by Sanger sequencing (detailed procedures in 
Supplementary Material). 



Results 

Linkage over the 10 loci comprising the NSP. We have 
previously demonstrated that testing the average linkage 
score across multiple loci and comparing it to the average of 
randomly selected loci is a powerful approach to show 
enrichment in true loci 49 According to the pathway clustering 
model, we expect such enrichment across the 10 NSP 
genes. The individual NPL scores from our 123 multiplex 
pedigrees are as follows: BACE1, 1.92; APH1A, 1.70; 
PSENEN, 1.21; NCSTN, 1.17; APH1B, 0.96; NRG1, 0.92; 
NRG3, 0.12; ERBB4, -0.25; PSEN2, -0.44; and PSEN1, 



Table 1 Pairwise correlations between linkage signals across 123 multiplex schizophrenia families 



GENE 


NCSTN 


PSEN2 


APH1A 


PSEN1 


APH1B 


PSENEN 


BACE1 


NRG1 


NRG3 


ERBB4 


NCSTN 




0.17 




0.12 


0.15 


0.05 


0.17 


-0.06 


0.04 


0.08 


PSEN2 


0.005 




0.12 


0.04 


0.00 


0.12 


0.07 


-0.01 


-0.17 


0.06 


APH1A 




0.046 




0.05 


0.07 


-0.06 


0.13 


0.00 


0.08 


0.07 


PSEN1 
APH1B 
PSENEN 


0.046 
0.014 








0.07 


0.03 
0.02 


-0.09 
-0.06 
0.07 


0.05 
-0.03 
-0.05 


-0.01 
0.01 
-0.13 


0.02 
-0.01 
-0.01 


BACE1 

NRG1 

NRG3 


0.006 


0.005 


0.028 






0.036 




-0.03 


0.03 
-0.05 


0.01 
0.03 
0.00 



ERBB4 



Kendall's t coefficient is shown above the diagonal, with nominally significant correlations underlined and those that are positive in bold. Nominally significant 
P-values are below the diagonal and if the correlation is positive they are in bold. The P-value for the linked loci NCSTN-APH1 A has been removed. 
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-0.59. The overall average is 0.67. To assess the 
significance of this combined linkage, we performed 
100000 draws of 10 loci at random from the same genome 
scan providing an empirical P= 0.067, which, although not 
formally reaching significance, suggests that many of the 10 
NSP genes may carry SZ risk alleles. 

Correlation of linkage scores across SZ families. 

According to our pathway clustering model, the subset of 
families with disrupted NSP function will show linkage to 
many NSP loci, while the other families to none, generating 
correlations of linkage scores between NSP loci. All pairwise 
correlations of linkage scores between the 10 NSP loci in our 
data are shown in Table 1 . Two genes, NCSTN and APH1A, 
are located <50cM apart on the same chromosome; 
therefore; correlation of linkage is expected on this basis 
and should be disregarded (the P-value has been removed 
from Table 1). Of the 44 remaining comparisons, 28 showed 
positive and 16 negative correlations compared to an 
expected 22 of each (observed vs expected x 2 P=0.07). 
Two nominally significant negative correlations were 
observed, both for NRG3 (not different from the 1 .1 negative 
correlation expected by chance in 44 tests, underlined in 



Table 2 Correlations between the number of damaging variants in each gene, 
with the number of variants in the remaining NSP genes 



Individuals (N = 48) Families = 24) 





Kendall x 


P-value 


Kendall x 


P-value 






(one-tailed) 




(one-tailed) 


NRG 1 vs other 


0.23 


0.010 


0.31 


0.016 


APH1B vs other 


0.28 


0.002 


0.33 


0.011 


NRG3 vs other 


-0.17 


NA 


-0.24 


NA 


Excluding NRG3 variants 








NRG1 vs other 


0.34 


3.6x 10" 4 


0.53 


1.5x 10" 4 


APH1B vs other 


0.50 


2.8x 10" 7 


0.67 


2.5x 10" 6 



Abbreviations: NA, not applicable; NRG, neuregulin; NSP, neuregulin signaling 
pathway; SZ, schizophernia. 

Rare variant correlations for each gene with the remaining NSP genes based on 
48 SZ cases. To account for familial genotype correlations, results are also 
shown for the average variant load of each pair (24 observations). The lower two 
lines show the results after excluding NRG3 from the NSP (see text). 



Table 1). In contrast, six positive correlations were statisti- 
cally significant (Kendall's rank correlation P<0.05, bold and 
underlined in Table 1). The number of positive nominally 
significant results is almost sixfold higher than expected by 
chance (6 observed vs 1.1 expected; x 2 P=2.2x10 -6 ), 
consistent with our model. As it could be argued that the 
positional correlation between NCSTN and APH1A makes 
the observations non-independent, we repeated the analysis 
removing all comparisons involving APH1A and this enrich- 
ment remained significant (P=7.2 x 10~ 4 ). This indepen- 
dent evidence from two different approaches prompted us to 
search for the responsible variants by sequencing all coding 
exons of the 10 NSP genes. 

Clustering of damaging coding variants in the NSP. We 

applied next-generation sequencing on 24 pairs of distantly 
related affected subjects in 12 tagged pools of 4. We 
obtained on average 12 356105 100-bp single-end reads 
from each indexed library and more than 75% of the reads 
mapped successfully to the hg19 human reference genome 
using Bowtie default parameters. The mean coverage for 
coding variants across all 12 libraries was 21 613 (median 
21 504) reads, which resulted in an average read depth of 
2702 reads per allele. To complete our screen, we 
sequenced all exons with <20x coverage per individual 
allele (2.6% of all exons) by Sanger sequencing in the 
respective pools. 

All identified exonic variants are shown in Supplementary 
Table 2. We detected 13 non-synonymous missense variants 
in 15 of the 24 pairs examined (62.5%). We validated all 
variants and de-convoluted the pools by conventional Sanger 
sequencing. Twelve variants were present in dbSNP135 and 
one, located in APH1B gene, was novel (p.Val199Leu). 
Characterization of the functional impact of coding variants 
using PolyPhen-2 and SIFT algorithms 53,54 predicted that 7 
out of the 13 are potentially deleterious to protein function. All 
predicted damaging variants had minor allele frequencies 
below 0.12 except rs3924999 in NRG1, which had a minor 
allele frequency of 0.48. On the basis of our hypothesis as 
outlined in the introduction, we only considered for further 
analysis variants that had a minor allele frequency below 0. 1 2 
and were predicted to be damaging by at least one algorithm. 
These are shown in Supplementary Table 2. 



Table 3 Differences in symptom dimensions between those SZ cases that carry NSP damaging variants and those that do not 



Factor Number of NSP variants Excluding NRG3 NRG3 alone 





Direction 


P-value 


Direction 


P-value 


Direction 


P-value 


Negative symptoms 




0.586 




0.376 




0.623 


Affective symptoms 




0.982 




0.944 




0.859 


Disorganization 




0.420 




0.490 




0.677 


Prodromal signs 




0.706 




0.874 




0.613 


Delusions 




0.231 




0.521 




0.183 


Scholastic functioning 




0.791 




0.444 




0.386 


Child/adolescent sociability 


(-) 


0.034 


(-) 


0.015 


(-) 


0.998 


Impairment/disability 


(-) 


0.018 


(-) 


0.004 


( + ) 


0.850 


Hallucinations 


( + ) 


0.141 


( + ) 


0.028 


(-) 


0.380 



Abbreviations: NRG, neuregulin; NSP, neuregulin signaling pathway; SZ, schizophernia. 

The effect of NSP damaging variants on the phenotype is measured through correlations between NSP variant load and factorial dimension scores (see text). The 
direction of the effect is only shown for significant P-values which are shown in bold. 
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Unaffected 



Pathway clustering model 
Affected (1) 



Affected (2) 






# = damaging variant 



Figure 2 Pathway clustering model. Schematic representation of the proposed 
pathway clustering model for schizophrenia demonstrating how clustering of 
multiple protein damaging variants within different well-defined molecular pathways 
could represent an unrecognized factor of SZ risk and phenotypic heterogeneity. 



Our model predicts that potentially deleterious variants 
should cluster in certain SZ individuals/pedigrees, those 
where the disease is caused by disruption of the NSP. The 
question therefore is whether a SZ patient who has damaging 
variants in one NSP gene is more likely to also have damaging 
variants in the other NSP genes. We tested by Kendall's rank 
correlation analysis for correlations between the counts of 
variants in each gene compared with the count of variants in 
the remaining genes. This is a total of three tests, one for each 
of NRG 1, APH1B and NRG3; APH1A was not tested because 
only two individuals carried the identified mutation, but was 
included when testing other genes. We found significant 
correlations for NRG1 and APH1B (t = 0.23, P=0.01, and 
t = 0.28, P= 0.002, respectively, one-tailed). A/PG3showed a 
nonsignificant negative correlation. To account for within- 
family genotype correlations that might inflate the statistic 
(the 48 observations are not truly independent), we also 
performed the analysis on the average variant burden of the 
two relatives and the family as a unit. Despite reducing the 
observations to 24, we obtained similar strong results 
(Table 2). 

The lack of positive correlations for NRG3was of interest as 
the link of this gene with [3- and y-secretase processing has 
not been shown. It is also interesting that NRG3 6\6 not show 
positive linkage correlations with other genes, wherewas 
when we tried removing it from the set of genes tested for their 
NPL average, the permutation P-value improved (P= 0.053). 
Although NRG3 is a paralog of NRG1 and may share some 
pathway components, it may well function in a different or 
parallel pathway. For this reason, we performed the same 
analysis excluding NRG3. We observed marked improvement 
and high statistical significance of the correlations of variation 
in APH1B and NRG1 with the remaining genes (t = 0.67, 
P=2.5x10~ 6 , permutation P=1.7x10~ 4 ; t = 0.53, 
P=1.5x10~ 4 , permutation P=3.4x10~ 3 ). This result, 
which remains significant after Bonferroni-based correction 
for removing each of the genes and re-testing, strongly 
supports our hypothesis and has potential implications for the 
possible relationship of NRG3\o the NSP and its processing 
compared to NRG1. It is interesting to observe that while the 



carriers of the NRG3 variant do not have more variants in 
other NSP genes, in four of the six families both affected 
members carry the damaging variant, despite the expected 
identity by descent sharing being 25% or less. 

Phenotypic correlates of NSP damaging variants. The 

pathway clustering model predicts that individuals with 
different affected pathways might differ in phenotype. The 
sequencing results above allow us to identify individuals 
whose SZ is likely due to variants in the NSP. The same 
individuals have been studied in a phenotype factor analysis 
that we have performed and published, 47 each having 
available scores for nine quantitative factors reflecting 
distinct clinical dimensions. We tested whether the NSP 
mutational load correlates with the each of the nine factor 
scores, the scores that are calculated for each individual 
based on the presence or absence of the 73 phenotypic 
indicators and their loadings on each factor. Since we have 
shown that the factors show familial aggregation, 47 for these 
factors we used a generalized linear model that included 
family ID as a categorical predictor. Owing to the sequencing 
results above questioning the inclusion of NRG3 in the gene 
set, we performed this analysis with and without NRG3. We 
observed significant correlations with two of the nine factors 
that improved and increased to three when excluding NRG3 
variants (Table 3). The strongest result was for the 
impairment/disability factor (P= 0.004), with lower disability 
in carriers of NSP damaging variants compared to patients 
with other pathways presumably affected. Of note, the 
presence of NSP damaging variants also predicted an 
elevated hallucination score when NRG3 variants were not 
included in the analysis (P= 0.028). 

Discussion 

SZ is a complex multifactorial mental disorder likely caused by 
the synergistic effects of genetic and environmental influ- 
ences. 2 ' 55 As sample sizes have grown, genetic studies are 
detecting common polymorphic variants robustly associated 
with SZ in diverse ethnic populations, signifying that SZ is a 
highly polygenic disorder with multiple genetic loci involved. 2 ' 4 
In line with this notion, a recent hypothesis suggests that 
hundreds or even thousands of loci act together in the context 
of inter-related molecular pathways to confer liability to SZ. 56 
Here, we further propose that in individual cases SZ may be 
the result of variants accumulating and perturbing the function 
of only one or a few of the many critical pathways. The specific 
pathway(s) may differ between patients, but might be similar 
within families. This model combines pathway heterogeneity 
and a requirement for multiple hits to the pathway. Its main 
difference from models commonly used in sequencing studies 
for rare variants is the prediction of heterogeneity among 
patients, which increases power and can be explored in a 
patient-only design. It further proposes that in addition to the 
number of damaging genetic variants, their distribution in 
pathway gene-sets is crucial, which could explain the 
difficulties of association and sequencing studies in identifying 
causative genetic loci. Directly testing this model will soon 
become possible with extensive exome and genome sequen- 
cing data combined with detailed and reliable delineation of 
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the relevant pathways. In our study, we focus on one pathway 
well connected to SZ whose core components are well 
characterized and we consistently find support for the model. 

There are two important aspects of our results, one 
regarding the NSP and another regarding the possibility that 
this model can be generalized. As others have previously 
suggested through association studies, 25 ' 28 ' 57 we now show 
through different approaches that multiple members of the 
NSP are involved in SZ. The increased linkage scores across 
the NSP gene-set, the correlations between linkage scores, 
the aggregation of potentially deleterious variants in a subset 
of patients and families and the phenotypic differences of 
these individuals strongly support the idea that multiple hits in 
the NSP increase the risk of SZ in a subset of individuals and 
families. We point to a few specific variants that should be the 
subject of functional follow-up including their possible 
biological interactions. There is little doubt that these variants 
are only a subset and that others, including non-coding or 
synonymous functional variants, could be involved. Further, 
our list of NSP genes may be incomplete and it appears from 
our results that NRG3 should be removed. Having a reliable 
model will help identify such variants and new gene-set 
members in larger samples, having a significant impact on our 
knowledge of the role of the NSP in SZ etiology. It should be 
noted that our assumption of a linear relationship between 
NSP variant load and phenotype is admittedly an over- 
simplification, only applied for lack of enough data to explore 
the true underlying model. In our study 6 out of 24 families 
(25%) carry variants in multiple NSP genes and we find that 
carrying such variants leads to lower scores in the 'impair- 
ment/disability' phenotypic factor (higher functioning) and 
higher scores in the 'hallucinations' factor (more positive 
symptoms). This is of particular interest in view of results 
exploring NRG3 and showing an association with a subgroup 
of subjects with SZ who experience florid delusions and have 
relatively spared cognition 27 Although our factor scores are 
only a statistical construct and the factor names simply 
derived from highly loading phenotypic items, there is a clear 
parallel emerging suggesting the NSP as a pathway asso- 
ciated with positive symptoms, relatively sparing cognition 
and retaining functional capacity. 

The second aspect of these data is that they are consistent 
with a broad model of pathway heterogeneity and damaging 
variant clustering for SZ illustrated in Figure 2. This pathway 
clustering model can be exploited for the analysis of forth- 
coming large-scale sequencing data on SZ cases and 
controls. Although we have not shown that other pathways 
behave in the same way, the NSP provides a paradigm. In 
fact, a likely scenario might involve more than one biological 
process per pathway and more than one pathway per affected 
individual, in a nested clustering model that future large 
patient sequencing data sets will make possible to explore. 
Today few other pathways combine the connections to SZ and 
the biological knowledge available for NRG. Possible candi- 
dates include RELN, 58 DISC1 -interacting proteins 59 and 
pathways emerging from large collaborative genome-wide 
association studies. 60 Statistical models based on the path- 
way heterogeneity hypothesis applied on forthcoming larger 
data sets with increased statistical power can test networks 
of genes revealed from proteomics, 61 gene expression 



experiments and gene ontology databases. Exploratory 
analyses could help refine these networks, greatly improving 
our knowledge of the disease. 

While our results are positive and consistent, it is important 
to keep in mind the limitations of each approach. Linkage 
information is well known to be inherently noisy, especially 
when the number of families is relatively small. While our test 
across loci approaches significance, the individual NPL 
scores should be considered with caution. The same is true 
for positive correlations between linkage scores, which are 
found in highly significant excess but are hard to assess 
individually. Further, it cannot be assumed that we detected all 
deleterious variants or all the variants we analyzed are in fact 
deleterious. Some of our predicted functional variants could 
be mischaracterized and others missed, together with all 
functional variants in non-coding regions. This could explain 
why some genes in which we found no variants show strong 
linkage scores and correlations. Finally, as we noted it is more 
than likely that the conservatively selected set of NSP genes is 
incomplete or inaccurate. For example, NRG3 perhaps should 
be examined independently from NRG1. 

Overall, in this work we explore the NSP and find that it 
shows aggregation of potentially damaging genetic variation 
in a subset of SZ patients and families. This result leads us to 
propose a model of pathway clustering, where multiple hits in 
one of many possible pathways explains SZ in a subset of 
cases. Exploring this possibility could lead to a much better 
understanding of disease risk and genetic heterogeneity, and 
also provide a framework for more successful analytical 
approaches. 
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